Dataset statistics
| Number of variables | 6 |
|---|---|
| Number of observations | 839 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 39.5 KiB |
| Average record size in memory | 48.2 B |
Variable types
| Numeric | 6 |
|---|
| Houses - median sale price ($) is highly correlated with INCOME_8 | High correlation |
|---|---|
| INCOME_17 is highly correlated with INCOME_2 | High correlation |
| INCOME_2 is highly correlated with INCOME_17 | High correlation |
| INCOME_8 is highly correlated with Houses - median sale price ($) | High correlation |
| Houses - median sale price ($) is highly correlated with INCOME_17 and 1 other field | High correlation |
| INCOME_17 is highly correlated with Houses - median sale price ($) and 1 other field | High correlation |
| INCOME_2 is highly correlated with Houses - median sale price ($) and 1 other field | High correlation |
| INCOME_8 is highly correlated with Houses - median sale price ($) and 1 other field | High correlation |
| Houses - median sale price ($) is highly correlated with INCOME_8 and 1 other field | High correlation |
| INCOME_17 is highly correlated with INCOME_8 and 1 other field | High correlation |
| Houses - median sale price ($) has 68 (8.1%) zeros | Zeros |
| INCOME_11 has 57 (6.8%) zeros | Zeros |
| INCOME_8 has 46 (5.5%) zeros | Zeros |
| INCOME_5 has 43 (5.1%) zeros | Zeros |
Reproduction
| Analysis started | 2021-08-18 06:15:16.494892 |
|---|---|
| Analysis finished | 2021-08-18 06:15:24.470179 |
| Duration | 7.98 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
Houses - median sale price ($)
Real number (ℝ)
HIGH CORRELATION, ZEROS
| Distinct | 489 |
|---|---|
| Distinct (%) | 58.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.387569658 × 10⁻¹⁷ |
| Minimum | -1.456628867 |
| Maximum | 9.478944059 |
| Zeros | 68 |
| Zeros (%) | 8.1% |
| Negative | 487 |
| Negative (%) | 58.0% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -1.456628867 |
|---|---|
| 5-th percentile | -1.001289798 |
| Q1 | -0.587273697 |
| median | -0.158315554 |
| Q3 | 0.2062988675 |
| 95-th percentile | 1.862077299 |
| Maximum | 9.478944059 |
| Range | 10.93557293 |
| Interquartile range (IQR) | 0.7935725645 |
Descriptive statistics
| Standard deviation | 1.000596481 |
|---|---|
| Coefficient of variation (CV) | 2.953729611 × 10¹⁶ |
| Kurtosis | 15.79000838 |
| Mean | 3.387569658 × 10⁻¹⁷ |
| Median Absolute Deviation (MAD) | 0.411084887 |
| Skewness | 2.993097448 |
| Sum | 2.842170943 × 10⁻¹⁴ |
| Variance | 1.001193317 |
| Monotonicity | Not monotonic |
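The standard deviation of 1.000596481 and variance of 1.001193317 reported for every column are consistent with data that was standardized using the population standard deviation (dividing by n) and then summarized with the sample estimator (dividing by n − 1), because 839/838 ≈ 1.001193317. This is an inference from the numbers, not something the report states. The sketch below reproduces the effect on made-up data; the helper names and the random input are assumptions for illustration only.

```python
import math
import random


def zscore_population(values):
    """Center the values and divide by the population standard deviation (ddof=0)."""
    n = len(values)
    mean = sum(values) / n
    pop_std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / pop_std for v in values]


def sample_variance(values):
    """Unbiased sample variance (ddof=1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


random.seed(0)
n = 839  # number of observations in the report
raw = [random.lognormvariate(0, 1) for _ in range(n)]  # hypothetical skewed data

z = zscore_population(raw)
var = sample_variance(z)
std = math.sqrt(var)

# The sample variance of population-standardized data equals n / (n - 1).
print(round(var, 9), round(std, 9))  # 1.001193317 1.000596481
```

The same arithmetic also explains why the means and sums are tiny nonzero numbers rather than exactly 0: centering leaves floating-point residue, which in turn makes the coefficient of variation (standard deviation divided by mean) very large and uninformative.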
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 68 | 8.1% |
| -0.4442876493 | 9 | 1.1% |
| -0.5300792779 | 9 | 1.1% |
| -0.587273697 | 8 | 1.0% |
| -0.3727946255 | 8 | 1.0% |
| -0.02962811112 | 8 | 1.0% |
| -0.3298988112 | 8 | 1.0% |
| -0.5157806732 | 7 | 0.8% |
| -0.2441071826 | 7 | 0.8% |
| 0.1276565413 | 6 | 0.7% |
| Other values (479) | 701 | 83.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -1.456628867 | 1 | 0.1% |
| -1.395144866 | 1 | 0.1% |
| -1.362258075 | 1 | 0.1% |
| -1.273606726 | 1 | 0.1% |
| -1.245009516 | 1 | 0.1% |
| -1.216412307 | 2 | 0.2% |
| -1.202113702 | 1 | 0.1% |
| -1.187815097 | 1 | 0.1% |
| -1.173516492 | 1 | 0.1% |
| -1.159217888 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.478944059 | 1 | 0.1% |
| 6.476237058 | 1 | 0.1% |
| 5.618320772 | 1 | 0.1% |
| 5.189362629 | 1 | 0.1% |
| 4.939137045 | 1 | 0.1% |
| 4.517328205 | 1 | 0.1% |
| 4.217057505 | 1 | 0.1% |
| 4.188460295 | 1 | 0.1% |
| 3.830995176 | 1 | 0.1% |
| 3.702307733 | 1 | 0.1% |
INCOME_17
Real number (ℝ)
| Distinct | 708 |
|---|---|
| Distinct (%) | 84.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.911976004 × 10⁻¹⁶ |
| Minimum | -2.508099021 |
| Maximum | 6.935083182 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 408 |
| Negative (%) | 48.6% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -2.508099021 |
|---|---|
| 5-th percentile | -1.469353797 |
| Q1 | -0.6125112686 |
| median | 8.765589081 × 10⁻¹⁶ |
| Q3 | 0.4419316309 |
| 95-th percentile | 1.860469056 |
| Maximum | 6.935083182 |
| Range | 9.443182203 |
| Interquartile range (IQR) | 1.0544429 |
Descriptive statistics
| Standard deviation | 1.000596481 |
|---|---|
| Coefficient of variation (CV) | 2.037054904 × 10¹⁵ |
| Kurtosis | 3.622916504 |
| Mean | 4.911976004 × 10⁻¹⁶ |
| Median Absolute Deviation (MAD) | 0.5489119559 |
| Skewness | 1.035654398 |
| Sum | 4.121147867 × 10⁻¹³ |
| Variance | 1.001193317 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 8.765589081 × 10⁻¹⁶ | 118 | 14.1% |
| -1.677676295 | 2 | 0.2% |
| 0.3080857513 | 2 | 0.2% |
| -0.6660616678 | 2 | 0.2% |
| 1.76749978 | 2 | 0.2% |
| -0.7948476671 | 2 | 0.2% |
| 1.056948026 | 2 | 0.2% |
| 0.3845863216 | 2 | 0.2% |
| -0.3829493211 | 2 | 0.2% |
| 0.1062929085 | 2 | 0.2% |
| Other values (698) | 703 | 83.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -2.508099021 | 1 | 0.1% |
| -2.298234464 | 1 | 0.1% |
| -2.254743589 | 1 | 0.1% |
| -2.212818867 | 1 | 0.1% |
| -2.2122165 | 1 | 0.1% |
| -2.192217926 | 1 | 0.1% |
| -2.109573216 | 1 | 0.1% |
| -2.065841393 | 1 | 0.1% |
| -2.042951459 | 1 | 0.1% |
| -2.042349092 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.935083182 | 1 | 0.1% |
| 4.098779363 | 1 | 0.1% |
| 3.965294903 | 1 | 0.1% |
| 3.611103286 | 1 | 0.1% |
| 3.379312582 | 1 | 0.1% |
| 3.324497213 | 1 | 0.1% |
| 3.285825271 | 1 | 0.1% |
| 3.274982671 | 1 | 0.1% |
| 3.175592166 | 1 | 0.1% |
| 2.957174002 | 1 | 0.1% |
INCOME_11
Real number (ℝ)
| Distinct | 731 |
|---|---|
| Distinct (%) | 87.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.016270897 × 10⁻¹⁶ |
| Minimum | -1.769952435 |
| Maximum | 4.097018625 |
| Zeros | 57 |
| Zeros (%) | 6.8% |
| Negative | 418 |
| Negative (%) | 49.8% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -1.769952435 |
|---|---|
| 5-th percentile | -1.406502766 |
| Q1 | -0.6603988005 |
| median | 0 |
| Q3 | 0.4519466962 |
| 95-th percentile | 2.013562008 |
| Maximum | 4.097018625 |
| Range | 5.86697106 |
| Interquartile range (IQR) | 1.112345497 |
Descriptive statistics
| Standard deviation | 1.000596481 |
|---|---|
| Coefficient of variation (CV) | 9.84576537 × 10¹⁵ |
| Kurtosis | 1.774099833 |
| Mean | 1.016270897 × 10⁻¹⁶ |
| Median Absolute Deviation (MAD) | 0.5569307033 |
| Skewness | 1.02821267 |
| Sum | 8.526512829 × 10⁻¹⁴ |
| Variance | 1.001193317 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 57 | 6.8% |
| -1.406502766 | 35 | 4.2% |
| -1.406671969 | 3 | 0.4% |
| -1.396688948 | 2 | 0.2% |
| -0.356085844 | 2 | 0.2% |
| 0.7508451324 | 2 | 0.2% |
| 0.07199966045 | 2 | 0.2% |
| 0.2855348014 | 2 | 0.2% |
| 0.2848579864 | 2 | 0.2% |
| 0.2143000198 | 2 | 0.2% |
| Other values (721) | 730 | 87.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -1.769952435 | 1 | 0.1% |
| -1.706162619 | 1 | 0.1% |
| -1.679936037 | 1 | 0.1% |
| -1.61479259 | 1 | 0.1% |
| -1.606163199 | 1 | 0.1% |
| -1.549310736 | 1 | 0.1% |
| -1.465047266 | 1 | 0.1% |
| -1.446604056 | 1 | 0.1% |
| -1.433406163 | 1 | 0.1% |
| -1.426130402 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.097018625 | 1 | 0.1% |
| 3.955564285 | 1 | 0.1% |
| 3.868762757 | 1 | 0.1% |
| 3.739998699 | 1 | 0.1% |
| 3.440846457 | 1 | 0.1% |
| 3.411235799 | 1 | 0.1% |
| 3.389746922 | 1 | 0.1% |
| 3.375364603 | 1 | 0.1% |
| 3.320204178 | 1 | 0.1% |
| 3.298546097 | 1 | 0.1% |
INCOME_2
Real number (ℝ)
| Distinct | 737 |
|---|---|
| Distinct (%) | 87.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -5.166043728 × 10⁻¹⁶ |
| Minimum | -2.821009014 |
| Maximum | 7.127331837 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 502 |
| Negative (%) | 59.8% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -2.821009014 |
|---|---|
| 5-th percentile | -1.474022022 |
| Q1 | -0.6328614945 |
| median | -8.822869829 × 10⁻¹⁶ |
| Q3 | 0.5060786872 |
| 95-th percentile | 1.777180796 |
| Maximum | 7.127331837 |
| Range | 9.948340851 |
| Interquartile range (IQR) | 1.138940182 |
Descriptive statistics
| Standard deviation | 1.000596481 |
|---|---|
| Coefficient of variation (CV) | -1.936871876 × 10¹⁵ |
| Kurtosis | 3.776971607 |
| Mean | -5.166043728 × 10⁻¹⁶ |
| Median Absolute Deviation (MAD) | 0.5881114828 |
| Skewness | 0.9307780909 |
| Sum | -4.334310688 × 10⁻¹³ |
| Variance | 1.001193317 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -8.822869829 × 10⁻¹⁶ | 89 | 10.6% |
| 0.5881114828 | 4 | 0.5% |
| -0.2607127143 | 2 | 0.2% |
| -0.5419160448 | 2 | 0.2% |
| -0.01613008496 | 2 | 0.2% |
| -0.7801931229 | 2 | 0.2% |
| -0.305942918 | 2 | 0.2% |
| -0.07397139096 | 2 | 0.2% |
| 1.287542621 | 2 | 0.2% |
| 1.302336414 | 2 | 0.2% |
| Other values (727) | 730 | 87.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -2.821009014 | 1 | 0.1% |
| -2.545868714 | 1 | 0.1% |
| -2.42776089 | 1 | 0.1% |
| -2.309168023 | 1 | 0.1% |
| -2.295223054 | 1 | 0.1% |
| -2.206096513 | 1 | 0.1% |
| -2.131399984 | 1 | 0.1% |
| -2.064949176 | 1 | 0.1% |
| -2.062645224 | 1 | 0.1% |
| -2.054399503 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7.127331837 | 1 | 0.1% |
| 4.38756985 | 1 | 0.1% |
| 3.831347479 | 1 | 0.1% |
| 3.483814601 | 1 | 0.1% |
| 3.397840836 | 1 | 0.1% |
| 3.318536404 | 1 | 0.1% |
| 2.859322513 | 1 | 0.1% |
| 2.835797957 | 1 | 0.1% |
| 2.715264921 | 1 | 0.1% |
| 2.658878742 | 1 | 0.1% |
INCOME_8
Real number (ℝ)
| Distinct | 485 |
|---|---|
| Distinct (%) | 57.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -4.234462072 × 10⁻¹⁷ |
| Minimum | -0.6782751478 |
| Maximum | 13.35409414 |
| Zeros | 46 |
| Zeros (%) | 5.5% |
| Negative | 539 |
| Negative (%) | 64.2% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -0.6782751478 |
|---|---|
| 5-th percentile | -0.5913414091 |
| Q1 | -0.4229393142 |
| median | -0.2103017805 |
| Q3 | 0.07492286304 |
| 95-th percentile | 1.08465226 |
| Maximum | 13.35409414 |
| Range | 14.03236929 |
| Interquartile range (IQR) | 0.4978621772 |
Descriptive statistics
| Standard deviation | 1.000596481 |
|---|---|
| Coefficient of variation (CV) | -2.362983689 × 10¹⁶ |
| Kurtosis | 85.35501131 |
| Mean | -4.234462072 × 10⁻¹⁷ |
| Median Absolute Deviation (MAD) | 0.2356946156 |
| Skewness | 7.855874294 |
| Sum | -3.552713679 × 10⁻¹⁴ |
| Variance | 1.001193317 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 46 | 5.5% |
| -0.3127777004 | 6 | 0.7% |
| -0.512605744 | 5 | 0.6% |
| -0.3230252923 | 5 | 0.6% |
| -0.4374567362 | 5 | 0.6% |
| -0.3759711843 | 5 | 0.6% |
| -0.3674315243 | 5 | 0.6% |
| -0.4886946961 | 5 | 0.6% |
| -0.4340408722 | 4 | 0.5% |
| -0.3913425722 | 4 | 0.5% |
| Other values (475) | 749 | 89.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.6782751478 | 1 | 0.1% |
| -0.6611958278 | 2 | 0.2% |
| -0.6577799638 | 1 | 0.1% |
| -0.6509482358 | 1 | 0.1% |
| -0.6424085758 | 1 | 0.1% |
| -0.6338689159 | 3 | 0.4% |
| -0.6304530519 | 1 | 0.1% |
| -0.6287451199 | 2 | 0.2% |
| -0.6270371879 | 1 | 0.1% |
| -0.6236213239 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 13.35409414 | 1 | 0.1% |
| 12.7221593 | 1 | 0.1% |
| 9.844293888 | 1 | 0.1% |
| 7.492471527 | 1 | 0.1% |
| 5.506146615 | 1 | 0.1% |
| 4.688047188 | 1 | 0.1% |
| 4.660720276 | 1 | 0.1% |
| 4.281559372 | 1 | 0.1% |
| 4.088563057 | 1 | 0.1% |
| 3.442964762 | 1 | 0.1% |
INCOME_5
Real number (ℝ)
| Distinct | 770 |
|---|---|
| Distinct (%) | 91.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -3.387569658 × 10⁻¹⁷ |
| Minimum | -3.032266851 |
| Maximum | 7.593931862 |
| Zeros | 43 |
| Zeros (%) | 5.1% |
| Negative | 394 |
| Negative (%) | 47.0% |
| Memory size | 6.7 KiB |
Quantile statistics
| Minimum | -3.032266851 |
|---|---|
| 5-th percentile | -1.55619789 |
| Q1 | -0.5058618496 |
| median | 0 |
| Q3 | 0.4648576138 |
| 95-th percentile | 1.406602762 |
| Maximum | 7.593931862 |
| Range | 10.62619871 |
| Interquartile range (IQR) | 0.9707194635 |
Descriptive statistics
| Standard deviation | 1.000596481 |
|---|---|
| Coefficient of variation (CV) | -2.953729611 × 10¹⁶ |
| Kurtosis | 9.692946691 |
| Mean | -3.387569658 × 10⁻¹⁷ |
| Median Absolute Deviation (MAD) | 0.4887174713 |
| Skewness | 1.506883529 |
| Sum | -2.842170943 × 10⁻¹⁴ |
| Variance | 1.001193317 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 43 | 5.1% |
| 0.154222819 | 2 | 0.2% |
| -0.358364549 | 2 | 0.2% |
| -0.07044799174 | 2 | 0.2% |
| 0.2914455399 | 2 | 0.2% |
| 0.5083533348 | 2 | 0.2% |
| -0.7195731087 | 2 | 0.2% |
| 0.6825645428 | 2 | 0.2% |
| -0.2375811559 | 2 | 0.2% |
| 0.1060464562 | 2 | 0.2% |
| Other values (760) | 778 | 92.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -3.032266851 | 1 | 0.1% |
| -2.842757935 | 1 | 0.1% |
| -2.598222937 | 1 | 0.1% |
| -2.553699758 | 1 | 0.1% |
| -2.541598586 | 1 | 0.1% |
| -2.468763232 | 1 | 0.1% |
| -2.399809386 | 1 | 0.1% |
| -2.371268886 | 1 | 0.1% |
| -2.357797771 | 1 | 0.1% |
| -2.329942243 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7.593931862 | 1 | 0.1% |
| 6.611910361 | 1 | 0.1% |
| 5.879218662 | 1 | 0.1% |
| 5.095610712 | 1 | 0.1% |
| 5.049717589 | 1 | 0.1% |
| 4.76454092 | 1 | 0.1% |
| 4.230947744 | 1 | 0.1% |
| 3.816768018 | 1 | 0.1% |
| 3.746672552 | 1 | 0.1% |
| 2.560757723 | 1 | 0.1% |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
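The calculation described above can be sketched in plain Python. This is a minimal illustration on made-up numbers, not the code the report itself uses; the function name `pearson_r` and the sample data are assumptions.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]  # hypothetical, roughly linear in x

r = pearson_r(x, y)

# Shifting and rescaling each variable by a positive factor leaves r unchanged.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
```

Here `r` and `r_transformed` agree up to floating-point error, which is the location and scale invariance mentioned in the text.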
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
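As a rough sketch of that definition, ρ can be computed by ranking each variable and applying Pearson's formula to the ranks. The helper names and data below are illustrative assumptions; ties are given their average rank, which is a common convention but is not specified in the text.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average position of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but clearly nonlinear

rho = spearman_rho(x, y)  # perfect monotonic association
r = pearson_r(x, y)       # smaller, because the relationship is not linear
```

On this example ρ reaches its maximum while r stays below it, which is the sense in which ρ catches nonlinear monotonic relationships.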
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
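The pair-counting formula can be sketched directly. This implements the simple form stated above (often called tau-a), in which tied pairs count as neither concordant nor discordant; tie-corrected variants such as tau-b exist and can give different values when ties are present. The function name and data are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = total = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        total += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
    return (concordant - discordant) / total


x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]  # hypothetical ranks

tau = kendall_tau_a(x, y)
# 10 pairs in total: 7 concordant, 3 discordant, so tau = (7 - 3) / 10 = 0.4
```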
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
First rows
| | Houses - median sale price ($) | INCOME_17 | INCOME_11 | INCOME_2 | INCOME_8 | INCOME_5 |
|---|---|---|---|---|---|---|
| 0 | 0.000000 | 2.421670 | 2.058283 | 2.458071e+00 | -0.391343 | -0.067023 |
| 1 | 0.000000 | -0.627269 | 0.684010 | -6.750602e-01 | 0.039056 | -0.299229 |
| 2 | 0.000000 | -1.154461 | 1.180284 | -1.554685e+00 | -0.153940 | -0.493989 |
| 3 | -0.444288 | -1.575033 | 1.163702 | -1.825338e+00 | 0.000000 | 0.696036 |
| 4 | -0.575835 | -0.894118 | 0.675211 | -9.194003e-01 | 0.030517 | 0.127966 |
| 5 | -0.494333 | -1.570937 | 1.586373 | -1.149917e+00 | 0.025393 | -1.071649 |
| 6 | -0.620160 | -1.332882 | 0.677411 | -1.499996e+00 | 0.180815 | 0.023165 |
| 7 | -0.391383 | -1.524675 | 1.460486 | -1.553472e+00 | 0.132993 | -0.259500 |
| 8 | -0.258406 | -1.327581 | 1.361501 | -1.161315e+00 | 0.000000 | 0.465657 |
| 9 | -0.465021 | -2.042349 | 1.572668 | -8.822870e-16 | 0.592426 | -0.289639 |
Last rows
| | Houses - median sale price ($) | INCOME_17 | INCOME_11 | INCOME_2 | INCOME_8 | INCOME_5 |
|---|---|---|---|---|---|---|
| 829 | 0.413629 | 1.990255 | 2.806671 | 2.859323e+00 | 0.223513 | -0.395810 |
| 830 | 0.174842 | 1.293558 | 2.630360 | 1.610823e+00 | -0.005350 | -1.061831 |
| 831 | 0.549680 | 2.137594 | 2.972660 | -8.822870e-16 | 0.575347 | -1.189464 |
| 832 | 0.585212 | 1.629919 | 0.000000 | 1.940773e+00 | 0.563391 | -0.017933 |
| 833 | 1.099962 | 1.375359 | 3.298546 | 1.961630e+00 | 1.296094 | 0.000000 |
| 834 | 0.234896 | 0.553129 | 2.581630 | 8.590077e-01 | -0.046340 | -1.872381 |
| 835 | 2.222402 | 3.611103 | 3.739999 | -8.822870e-16 | 4.688047 | 5.879219 |
| 836 | 0.442226 | 1.630401 | 0.000000 | 2.085195e+00 | 0.435297 | -0.636007 |
| 837 | 0.000000 | 2.361192 | 1.705154 | 2.615225e+00 | -0.328149 | -1.855028 |
| 838 | 0.000000 | 3.285825 | 1.588065 | 3.483815e+00 | -0.435749 | -1.004293 |